7 research outputs found

    Multi-Layer Mutually Reinforced Random Walk with Hidden Parameters for Improved Multi-Party Meeting Summarization

    No full text
<p>This paper proposes an improved approach to summarization for spoken multi-party interaction, in which a multi-layer graph with hidden parameters is constructed. The graph includes utterance-to-utterance relations, utterance-to-parameter weights, and speaker-to-parameter weights. Each utterance and each speaker are represented as a node in the utterance layer and speaker layer of the graph, respectively. We use terms/topics as hidden parameters for estimating utterance-to-parameter and speaker-to-parameter weights, and compute topical similarity between utterances as the utterance-to-utterance relation. By within- and between-layer propagation in the graph, the scores from different layers can be mutually reinforced, so that utterances automatically share scores with utterances from speakers who focus on similar terms/topics. For both ASR output and manual transcripts, experiments confirmed the efficacy of including hidden parameters and involving speaker information in the multi-layer graph for summarization. We find that choosing latent topics as hidden parameters significantly reduces computational complexity without hurting performance.</p>

    Intra-Speaker Topic Modeling for Improved Multi-Party Meeting Summarization with Integrated Random Walk

    No full text
<p>This paper proposes an improved approach to summarization for spoken multi-party interaction, in which an integrated random walk is performed on a graph constructed with topical/lexical relations. Each utterance is represented as a node of the graph, and the edge between two nodes is weighted by the similarity between the two utterances. We use two types of edges: one from topical similarity evaluated by probabilistic latent semantic analysis (PLSA), and another from word overlap. We model intra-speaker topics by partially sharing the topics from the same speaker in the graph. We conducted experiments on both ASR and manual transcripts. For ASR transcripts, experiments showed that intra-speaker topic sharing and integrating topical/lexical relations can help include the important utterances.</p>
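The integrated random walk described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the PLSA topic posteriors are assumed to be given (here as a `topic_dists` array), lexical similarity is taken to be Jaccard word overlap, the two edge types are interpolated by a hypothetical weight `alpha`, and the walk itself is a PageRank-style power iteration.

```python
import numpy as np

def topical_similarity(topic_dists):
    # cosine similarity between per-utterance topic distributions
    # (a stand-in for PLSA posteriors P(topic | utterance))
    T = topic_dists / np.linalg.norm(topic_dists, axis=1, keepdims=True)
    return T @ T.T

def lexical_similarity(utterances):
    # word-overlap (Jaccard) similarity between utterances
    sets = [set(u.split()) for u in utterances]
    n = len(sets)
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            S[i, j] = len(sets[i] & sets[j]) / len(sets[i] | sets[j])
    return S

def integrated_random_walk(S_topic, S_lex, alpha=0.5, damping=0.85, iters=100):
    # interpolate the two edge types, row-normalize into transition
    # probabilities, then run a PageRank-style walk to score utterances
    W = alpha * S_topic + (1 - alpha) * S_lex
    np.fill_diagonal(W, 0.0)
    P = W / W.sum(axis=1, keepdims=True)
    n = len(W)
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - damping) / n + damping * (P.T @ r)
    return r
```

Utterances with the highest scores would then be selected as the summary; `alpha` and `damping` here are illustrative values rather than the paper's settings.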

    Integrating Intra-Speaker Topic Modeling and Temporal-Based Inter-Speaker Topic Modeling in Random Walk for Improved Multi-Party Meeting Summarization

    No full text
<p>This paper proposes an improved approach to summarization for spoken multi-party interaction, in which intra-speaker and inter-speaker topics are modeled in a graph constructed with topical relations. Each utterance is represented as a node of the graph, and the edge between two nodes is weighted by the similarity between the two utterances, i.e., topical similarity evaluated by probabilistic latent semantic analysis (PLSA). We model intra-speaker topics by sharing the topics from the same speaker, and inter-speaker topics by partially sharing the topics from adjacent utterances based on temporal information. We conducted experiments on both ASR and manual transcripts; for both, experiments showed that combining intra-speaker and inter-speaker topic modeling helps include the important utterances, improving summarization.</p>

    Two-layer mutually reinforced random walk for improved multi-party meeting summarization

    No full text
<p>This paper proposes an improved approach to summarization for spoken multi-party interaction, in which a two-layer graph with utterance-to-utterance, speaker-to-speaker, and speaker-to-utterance relations is constructed. Each utterance and each speaker are represented as a node in the utterance layer and speaker layer of the graph, respectively, and the edge between two nodes is weighted by the similarity between the two utterances, the two speakers, or the utterance and the speaker. The relation between utterances is evaluated by lexical similarity via word overlap or topical similarity via probabilistic latent semantic analysis (PLSA). By within- and between-layer propagation in the graph, the scores from different layers can be mutually reinforced, so that utterances automatically share scores with utterances from the same speaker and with similar utterances. For both ASR output and manual transcripts, experiments confirmed the efficacy of involving speaker information in the two-layer graph for summarization.</p>
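The within- and between-layer propagation can be sketched as follows, assuming the three similarity matrices are given. The damping factor, the row normalization, and the exact order of the within- and between-layer steps are illustrative choices here, not the paper's specification.

```python
import numpy as np

def normalize_rows(M):
    # turn a nonnegative weight matrix into row-stochastic transitions
    return M / M.sum(axis=1, keepdims=True)

def two_layer_reinforced_walk(W_uu, W_ss, W_su, damping=0.9, iters=200):
    """Mutually reinforced random walk on a two-layer graph (sketch).

    W_uu : utterance-to-utterance similarities (n_u x n_u)
    W_ss : speaker-to-speaker similarities     (n_s x n_s)
    W_su : speaker-to-utterance weights        (n_s x n_u)

    Scores propagate across layers and are then smoothed within each
    layer, so an utterance is reinforced by its speaker's score and
    vice versa.
    """
    P_uu = normalize_rows(W_uu)
    P_ss = normalize_rows(W_ss)
    P_su = normalize_rows(W_su)        # speaker -> utterance transitions
    P_us = normalize_rows(W_su.T)      # utterance -> speaker transitions
    n_u, n_s = W_uu.shape[0], W_ss.shape[0]
    u = np.full(n_u, 1.0 / n_u)        # utterance scores
    s = np.full(n_s, 1.0 / n_s)        # speaker scores
    for _ in range(iters):
        # between-layer step feeds each layer's scores into the other;
        # within-layer step smooths them over that layer's own edges
        u_new = (1 - damping) / n_u + damping * (P_uu.T @ (P_su.T @ s))
        s_new = (1 - damping) / n_s + damping * (P_ss.T @ (P_us.T @ u))
        u, s = u_new / u_new.sum(), s_new / s_new.sum()
    return u, s
```

After convergence, the highest-scoring utterance nodes would be selected as the summary.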

    Prosody-Based Unsupervised Speech Summarization with Two-Layer Mutually Reinforced Random Walk

    No full text
<p>This paper presents a graph-based model that integrates prosodic features into an unsupervised speech summarization framework without any lexical information. In particular, it builds on previous work using mutually reinforced random walks, in which a two-layer graph structure is used to select the most salient utterances of a conversation. The model consists of one layer of utterance nodes and another layer of prosody nodes. The random walk algorithm propagates scores between layers to exploit shared information when selecting the highest-scoring utterance nodes as summaries. A comparative evaluation of our prosody-based model against several baselines on a corpus of academic multi-party meetings reveals that it performs competitively on very short summaries, and better on longer summaries, according to ROUGE scores as well as the average relevance of selected utterances.</p>

    Unsupervised induction and filling of semantic slots for spoken dialogue systems using frame-semantic parsing

    No full text
<p>Spoken dialogue systems typically use predefined semantic slots to parse users' natural language inputs into unified semantic representations. Defining the slots often involves domain experts and professional annotators, and the cost can be high. In this paper, we ask the following question: given a collection of unlabeled raw audio, can we use frame semantics theory to automatically induce and fill the semantic slots in an unsupervised fashion? To do this, we propose the use of a state-of-the-art frame-semantic parser and a spectral-clustering-based slot ranking model that adapts the generic output of the parser to the target semantic space. Empirical experiments on a real-world spoken dialogue dataset show that the automatically induced semantic slots are in line with the reference slots created by domain experts: we observe a mean average precision of 69.36% using ASR-transcribed data. Our slot filling evaluations also indicate the promise of the proposed approach.</p>
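As a rough illustration of the spectral-clustering component only (the frame-semantic parsing and the ranking step are omitted), the sketch below groups slot candidates by the sign of the Fiedler vector of a normalized graph Laplacian, a standard two-way spectral partition. The similarity matrix `S` between candidates is assumed to be given, e.g. from comparing the contexts in which parser-emitted frames occur.

```python
import numpy as np

def spectral_bipartition(S):
    """Two-way spectral clustering of slot candidates (sketch).

    S is a symmetric nonnegative similarity matrix between candidates.
    We build the normalized graph Laplacian and split on the sign of
    the eigenvector of its second-smallest eigenvalue (Fiedler vector).
    """
    d = S.sum(axis=1)                                  # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(S)) - D_inv_sqrt @ S @ D_inv_sqrt   # normalized Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)               # ascending eigenvalues
    fiedler = eigvecs[:, 1]                            # second-smallest
    return (fiedler > 0).astype(int)                   # 0/1 cluster labels
```

Recursive application (or k-means on several eigenvectors) would generalize this to more than two clusters.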

    An empirical investigation of sparse log-linear models for improved dialogue act classification

    No full text
<p>Previous work on dialogue act classification has primarily focused on dense generative and discriminative models. However, since automatic speech recognition (ASR) outputs are often noisy, dense models may produce biased estimates and overfit the training data. In this paper, we study sparse modeling approaches to improve dialogue act classification, since sparse models maintain a compact feature space that is robust to noise. To test this, we investigate element-wise frequentist shrinkage models such as the lasso, ridge, and elastic net, as well as structured sparsity models and a hierarchical sparsity model that embed the dependency structure and interactions among local features. In our experiments on a real-world dataset, when augmenting N-best word- and phone-level ASR hypotheses with confusion network features, our best sparse log-linear model obtains a relative improvement of 19.7% over a rule-based baseline, a significant 3.7% improvement over a traditional non-sparse log-linear model, and outperforms a state-of-the-art SVM model by 2.2%.</p>
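Of the shrinkage models mentioned, the lasso (L1-penalized log-linear model) can be sketched with a simple proximal-gradient (ISTA) loop; the soft-thresholding step is what drives weights to exact zeros and keeps the feature space compact. This is an illustrative stand-in on synthetic data, not the paper's features or training setup.

```python
import numpy as np

def soft_threshold(w, t):
    # proximal operator of the L1 penalty: shrinks weights toward
    # exact zeros, which is what makes the model sparse
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def lasso_logreg(X, y, lam=0.1, lr=0.1, iters=2000):
    """L1-regularized logistic regression via proximal gradient descent.

    X : (n, d) feature matrix, y : (n,) binary labels in {0, 1}.
    lam controls sparsity: larger lam zeroes out more weights.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted probabilities
        grad = X.T @ (p - y) / n             # logistic loss gradient
        w = soft_threshold(w - lr * grad, lr * lam)
    return w
```

Ridge would replace the soft threshold with plain weight decay, and the elastic net would combine both penalties.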